After loading the dataset into R, I realized the magnitude of the data I was analyzing. There are 687,253 observations of 18 variables, covering 74 countries, 304 different types of food, and 60 currencies. With so many observations, I needed to clean the data before starting my analysis. I kept the cleaning phase as broad as possible, retaining as many observations as I could.
One thing to note about this data set is that most of the data is from Africa.
For ease of viewing, I renamed columns and removed the ones that would not be useful. I wrote a function to merge the year and month columns into a date column, but I also retained the original year and month columns. For a country code column, I downloaded an existing data frame and performed a left join so that every country had its appropriate code. This also created a region column and an income group column. I performed the same kind of join to categorize foods into groups such as fruits, meat, and bread. I found that some entries, such as fuel and manual labor, were not food. I left them in but grouped them as “Not Food.”
The hardest part of the cleaning was unifying the price, because I had 60 unique currencies over the last two decades. Not only was the exchange rate a factor, so were inflation and deflation. To overcome this I downloaded a data set of Purchasing Power Parity (PPP) conversion factors from the World Bank, which account for both inflation and exchange rates. Once I had the PPP data set, I wrote a function to match every country and year with its appropriate PPP factor. With a PPP factor column in place, I calculated a unified price column by dividing the original price by the PPP factor. Unfortunately the PPP data set did not have a factor for every country and every year, so entries without one were deleted.
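The PPP step described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the original function: the column names (`country`, `year`, `price`, `ppp_factor`) and all values are assumptions made for the example.

```r
# Toy data: every name and number here is hypothetical.
prices <- data.frame(
  country = c("Country A", "Country A", "Country B", "Country B"),
  year    = c(2010, 2011, 2010, 2011),
  price   = c(500, 550, 600, 640),   # price in local currency
  stringsAsFactors = FALSE
)
ppp <- data.frame(
  country    = c("Country A", "Country A", "Country B"),
  year       = c(2010, 2011, 2010),  # no factor for Country B in 2011
  ppp_factor = c(250, 275, 300),
  stringsAsFactors = FALSE
)

# Left join: keep every price row and attach the PPP factor where one exists.
joined <- merge(prices, ppp, by = c("country", "year"), all.x = TRUE)

# Rows without a PPP factor cannot be converted, so they are dropped.
unified <- joined[!is.na(joined$ppp_factor), ]

# Unified price = original price divided by the PPP factor.
unified$unified_price <- unified$price / unified$ppp_factor
print(unified)
```

Counting `sum(is.na(joined$ppp_factor))` before dropping rows is a cheap way to see how many entries the missing PPP factors cost.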
I thought my analysis could begin, but I first had to unify the unit of measurement column. Foods were measured in kilograms, grams, pounds, gallons, milliliters, liters, and even Haitian marmites and Argentinian cuartillas. I converted everything to grams or liters. I then divided the unified price by the number of units to calculate a price-per-unit column.
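One way to sketch the unit conversion is a lookup table of multipliers joined onto the observations. The table, the column names (`quantity`, `unit`, `unified_price`), and the toy rows below are assumptions for illustration. The gallon factor assumes US gallons. Local units such as the marmite are deliberately left out of the lookup, because their factors would need to be sourced separately.

```r
# Hypothetical lookup: multiplier from each unit to grams (mass) or liters (volume).
unit_factors <- data.frame(
  unit      = c("KG", "G", "Pound", "L", "ML", "Gallon"),
  base_unit = c("g", "g", "g", "l", "l", "l"),
  factor    = c(1000, 1, 453.592, 1, 0.001, 3.78541),  # Gallon assumed to be US
  stringsAsFactors = FALSE
)

# Toy observations: every value here is made up.
obs <- data.frame(
  food          = c("Rice", "Oil", "Beans", "Maize"),
  unified_price = c(2, 3, 0.9, 1.5),
  quantity      = c(1, 500, 2, 1),
  unit          = c("KG", "ML", "Pound", "Marmite"),
  stringsAsFactors = FALSE
)

# Left join so that units missing from the lookup surface as NA instead of vanishing.
obs <- merge(obs, unit_factors, by = "unit", all.x = TRUE)

# Quantity in grams or liters, then price per single gram or liter.
obs$base_quantity  <- obs$quantity * obs$factor
obs$price_per_unit <- obs$unified_price / obs$base_quantity
print(obs[, c("food", "unit", "base_unit", "price_per_unit")])
```

Rows whose `price_per_unit` is `NA` are the ones that still need a conversion factor.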
To narrow my focus I looked at the most common foods. The top 10 most common foods, in order, were maize, millet, sorghum, rice (imported), rice, maize (white), rice (local), wheat, sugar, and wheat flour. I had a substantial number of entries, but I wanted to unify all types of the same food, so I regrouped the foods. I still had more foods than I wanted in my top ten, so I decided to focus on foods with over 25,000 entries. This left me with five foods: rice, maize, sorghum, beans, and millet. I did not rename the original entries because I wanted to be able to distinguish between local and imported foods.
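The regrouping and counting described above might look like the following sketch. It assumes variants are written with a parenthesised suffix, as in "Rice (imported)", and it uses a toy threshold of 3 in place of 25,000. The names `food_names`, `food_group`, and `keep` are hypothetical.

```r
# Toy food names: the parenthesised suffix marks a variant of the same food.
food_names <- c(
  rep("Rice (imported)", 3), rep("Rice", 2), rep("Rice (local)", 2),
  rep("Maize (white)", 2), rep("Maize", 2), "Sorghum"
)

# Add a group label by stripping the suffix. The original names stay untouched,
# so local and imported entries remain distinguishable.
food_group <- sub("\\s*\\(.*\\)$", "", food_names)

# Count entries per group, most common first.
counts <- sort(table(food_group), decreasing = TRUE)
print(head(counts, 10))

# Keep only groups above a threshold (the analysis used 25,000 entries).
threshold <- 3
keep <- names(counts)[as.vector(counts) > threshold]
print(keep)
```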
I split the top five foods into five data frames. The first thing I tried was plotting a line chart, and I noticed something strange: it did not look like a traditional line graph. I investigated the data and realized that for some countries I had only the national average price, while for others I had prices from multiple markets. This meant I had multiple prices for the same date, which would not graph properly. I went back to cleaning. I wrote a function to calculate the national average price for each of the top five food data frames. Note that this is not a true national average, because I assumed that all markets are weighted equally, which in reality is not true. I also wrote functions to calculate import vs. local, regional, and global averages.
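A minimal sketch of the equal-weight national average, using `aggregate()` on toy data. The column names and values are assumptions for illustration. A country that already reports a single national figure passes through unchanged, because the mean of one value is that value.

```r
# Toy market-level prices: every value here is made up.
market_prices <- data.frame(
  country        = c("Country A", "Country A", "Country A", "Country B"),
  year           = c(2010, 2010, 2010, 2010),
  month          = c(1, 1, 2, 1),
  market         = c("Market 1", "Market 2", "Market 1", "National Average"),
  price_per_unit = c(1.0, 1.4, 1.1, 2.0),
  stringsAsFactors = FALSE
)

# One row per country and month: the unweighted mean across that country's markets.
national <- aggregate(
  price_per_unit ~ country + year + month,
  data = market_prices,
  FUN  = mean
)
print(national)
```

If market sizes were known, `weighted.mean()` could replace `mean()` to move closer to a true national average. The data described here does not include such weights.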
I started the analysis with the food that had the most entries, rice, with 67,003 entries. For each food I began with a worldwide view, then narrowed it down to the six regions, and then narrowed it further to specific countries.
The overall price of rice has increased over the last decade. Additionally, there is a dramatic spike in the price from 2006 to 2007.
As for the inflation of rice, the values are roughly normally distributed, which suggests the overall price of rice is stable. There are a few outliers, which are caused by a few specific countries with high inflation.
To investigate the spike from 2006 to 2007, a line chart with all regions plotted is appropriate.
You can see that East Asia & Pacific, South Asia, Latin America & Caribbean, and Middle East & North Africa all seem to follow a similar trend, while Europe & Central Asia has a slightly higher price. Most notable is Sub-Saharan Africa, which has multiple spikes in price and is most likely the cause of the spike in 2006. Unfortunately, we do not have a complete time series for all regions.
After a closer look at each region, it is easy to see that Sub-Saharan Africa had volatile price increases. Of all the regions, Sub-Saharan Africa is the most prone to food price volatility due to shortages, poverty, and political conflict. It is the reason there were two spikes when all regions were plotted together.
In East Asia & Pacific the price drops for a few years but then picks up again. The drop and slow increase could be due to the economies of certain countries, population changes, and harvesting technology. A similar drop and increase exists in the Middle East & North Africa, but I do not have enough data to see whether this is a trend. South Asia has the most controlled price. Since over 90% of the world's rice is produced in the Asia-Pacific region, the price of rice is stable for people in these countries (http://www.fao.org/docrep/003/x6905e/x6905e04.htm).
Price only tells half of the story. Inflation for each region shows which regions are stable and which are unstable. To see this I created a box plot.
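The text does not spell out how inflation was computed, so the sketch below assumes one common definition: the year-over-year percentage change in the average price within each region. The regions, prices, and column names are toy values chosen for illustration.

```r
# Toy yearly average prices per region: every value here is made up.
yearly <- data.frame(
  region = rep(c("Region A", "Region B"), each = 4),
  year   = rep(2005:2008, times = 2),
  price  = c(1.00, 1.02, 1.04, 1.05,   # slow, steady rise
             1.00, 1.50, 1.20, 1.80),  # large swings
  stringsAsFactors = FALSE
)

# Sort so that each region's years are in order before differencing.
yearly <- yearly[order(yearly$region, yearly$year), ]

# Assumed definition: percent change from the previous year, within each region.
# The first year of each region has no previous year, so it is NA.
yearly$inflation <- ave(
  yearly$price, yearly$region,
  FUN = function(p) c(NA, diff(p) / head(p, -1) * 100)
)

# Box plot statistics per region; set plot = TRUE to draw the chart itself.
bp <- boxplot(inflation ~ region, data = yearly, plot = FALSE)

# Interquartile range per region as a simple numeric measure of stability.
spread <- tapply(yearly$inflation, yearly$region, IQR, na.rm = TRUE)
print(spread)
```

A wider box (larger interquartile range) indicates a less stable region under this definition.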
As expected, Sub-Saharan Africa is the most unstable region, with the most significant outliers. After investigating the data I found that these outliers are caused by Liberia in 2006 and Rwanda in 2015. Sub-Saharan Africa has the highest and most inconsistent prices, probably due to domestic and global pressures contributing to inflation.
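Tracing box plot outliers back to a country and year can be sketched with `boxplot.stats()`, which applies the same 1.5 × IQR whisker rule that `boxplot()` uses by default. The data frame below is a toy example with placeholder country names. It does not reproduce the Liberia or Rwanda figures.

```r
# Toy inflation values: every value here is made up.
infl <- data.frame(
  country   = c(rep("Country A", 4), rep("Country B", 4)),
  year      = rep(2004:2007, times = 2),
  inflation = c(2, 3, 1, 4, 2, 3, 60, -1),
  stringsAsFactors = FALSE
)

# Values that fall beyond the box plot whiskers.
out_values <- boxplot.stats(infl$inflation)$out

# The rows, and therefore the countries and years, behind those values.
outliers <- infl[infl$inflation %in% out_values, ]
print(outliers)
```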
Latin America & Caribbean has the smallest interquartile range, which means it is one of the most stable regions for rice prices. Fluctuations in rice production may not affect the rice price as much as in other regions, since wheat, maize, and beans heavily supplement this population’s diet. I also found that within the last ten years Latin America & Caribbean benefited from a growing economy and is trying to maintain stability.
Both Liberia and Nigeria have extremely high prices. The high prices are likely due to the aftermath of the Second Liberian Civil War, which ended in 2003, in addition to political corruption.
A box plot is a good way to compare inflation across many countries.
Histograms are useful for taking a closer look at each country.
With so many rice values, I have enough rice classified as imported and local to compare their prices and see whether there is a correlation between the two.
In every country, imported, local, and not-listed rice follow the same trend, with just a slight increase or decrease in price. The purple shows where they have the exact same price.
Does inflation for imported and local rice differ significantly?
Chad and Mali have an identical distribution for both Import and Not Listed, which leads me to believe that the rice I categorized as Not Listed may be imported rice. Additionally, in both of these countries local rice has a much less stable price, which makes sense because of seasonal crops. Mali has three growing seasons: main season Oct-Dec, off-season Dec-Jan, and deepwater rice May-July (http://ricepedia.org/mali). Chad has two seasons: main season Oct-Dec and off-season June-July (http://www.fao.org/docrep/005/Y4347E/y4347e0f.htm). Off-season rice has to be grown in well-irrigated areas.
The last analysis I will do for rice is the price to survive: the price of a year's worth of rice, at the equivalent of 1,000 calories of rice a day. I calculated that 1,000 calories of rice is about 769.23 g.
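The arithmetic behind the price to survive can be sketched as follows. The figure of 769.23 g per 1,000 calories implies about 1.3 calories per gram of rice (1000 / 769.23). The 365-day year, the column names, and the toy prices are assumptions for illustration.

```r
# Calorie density implied by the 769.23 g figure above.
kcal_per_gram  <- 1.3
grams_per_day  <- 1000 / kcal_per_gram  # about 769.23 g for 1,000 calories
grams_per_year <- grams_per_day * 365   # assumes a 365-day year

# Toy national average prices per gram: every value here is made up.
rice <- data.frame(
  country        = c("Country A", "Country B"),
  year           = c(2010, 2010),
  price_per_gram = c(0.001, 0.002),
  stringsAsFactors = FALSE
)

# Price to survive: the cost of one year of 1,000 calories of rice per day.
rice$price_to_survive <- rice$price_per_gram * grams_per_year
print(rice)
```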
```r
price_to_survive_plot_line(rice_price_to_survive)
price_to_survive_plot_bar(rice_price_to_survive)

# Same plots with Nigeria and Liberia taken out
price_to_survive_plot_line(rice_price_to_survive_no_lib_nig)
price_to_survive_plot_bar(rice_price_to_survive_no_lib_nig)
```
The next food in the analysis is maize, which is the second most frequent food in the dataset.
Unlike rice, maize shows no significant overall increase or decrease in price, though there is a large spike from 2002 to 2003.
As for inflation, there is a roughly normal distribution similar to rice, suggesting the overall price of maize has been stable for the last decade.
Since maize is not as universal a food as rice, we do not have data for all regions. Additionally, the time periods covered are not consistent across regions. Sub-Saharan Africa shows an almost identical spike from 2002 to 2003, which is the cause of the worldwide spike in the first maize plot. This spike was caused by a Southern African drought that lasted from 2002 to about 2005. The crisis mainly affected Malawi, Zambia, Lesotho, Zimbabwe, Swaziland, and part of Mozambique (World Health Organization (WHO), 4 Aug 2002: http://www.africanwater.org/drought_crisis_2002.htm). There is data for all six of these countries. Additionally, the World Food Program (WFP) estimates that more than 2.6 million people were affected by the food security crisis (http://pdf.usaid.gov/pdf_docs/Pnacp289.pdf).
***Europe & Central Asia appear to follow same trend (why)
Short summary
Country Price Matrix
Countries with a noticeably higher price for maize are South Sudan, Nigeria, and Guinea-Bissau. After 2006 Guinea-Bissau saw a drastic decrease in harvest yield, which would account for the higher prices (https://knoema.com/FAOPRDSC2016R/production-statistics-crops-crops-processed?country=1000860-guinea-bissau&item=1000920-maize). The decrease is likely due to climate change. The high prices in South Sudan can be explained by the famines that have affected South Sudan since 2001. South Sudan is currently recovering from a famine that hit in early 2017. In addition, there was conflict between rebels and the government from 2003 to 2005, known as the Darfur conflict (http://www.waterforsouthsudan.org/brief-history-of-south-sudan/).
Looking at the inflation for each region, we see that Sub-Saharan Africa has the most outliers, but East Asia & Pacific has a larger interquartile range. This is interesting because, according to my research, there has been a big push for maize farming in Asia. After the Philippines’ success with genetically modified corn, Vietnam and Indonesia were quick to follow (http://www.thehindubusinessline.com/economy/agri-business/south-east-asia-could-be-corn-hub-for-asia-pacific/article4140604.ece). The reason for the push is an increase in domestic demand for animal feed. The price instability is likely caused by the shift from imported maize to local maize.
Inflation Country Box
***add countries with high inflation
The third food in the analysis is sorghum, a cereal grain and the fifth most important cereal crop in the world, largely because of its natural drought tolerance and versatility as food, feed, and fuel (https://wholegrainscouncil.org/whole-grains-101/easy-ways-enjoy-whole-grains/grain-month-calendar/sorghum-june-grain-month). For these reasons it is mostly grown in Africa and Australia. In this data set we mostly see it in Africa.
Much like maize, sorghum shows a significant spike from 2002 to 2003, which corresponds to the drought in Southern Africa. Although sorghum can grow in dry climates, it is still vulnerable to droughts.
As for the inflation of sorghum, there is a fairly wide distribution, meaning prices fluctuate often.
Once again the vast majority of the data is from Sub-Saharan Africa. Also note the extremely low prices in Latin America & Caribbean, which is interesting because sorghum production is not very large in this region. In this dataset, though, the region is represented by one country, Honduras. Additionally, for Middle East & North Africa there is only data from one country, so the only true representation of a region is Sub-Saharan Africa.
Since Sub-Saharan Africa is the only region that is truly represented, this chart is accurate.
Once again the countries with the highest prices are South Sudan, Nigeria, and Guinea-Bissau.
**Look at why Sudan, Ethiopia and Cameroon are unstable
The next food for analysis is beans, which, like rice, is a more international food, so there is much more diversity in the data.
Here we can see that there has been a dramatic increase in price over the last few years. Additionally, it does not look like the price was affected by the drought in Southern Africa. The spike in 2001 is caused by a few high prices from Guatemala.
There is still an overall increase in the price of beans across all regions.
***europe is wrong for some reason
Prices of beans vary greatly from region to region, but Sub-Saharan Africa actually has the cheaper prices for beans.
As with regions, the price varies greatly between countries. The countries with the highest prices are Turkey, Congo, Nigeria, South Sudan, and Algeria.
***Add findings about above countries
***Why Timor-Leste and Malawi have unstable prices
The final food for analysis is another cereal grain: millet, which is most common in Asia and Africa, with 97% of millet production in developing countries. For this dataset we only have prices for Africa.
Over the last few years millet has had a lot of fluctuation in price but overall the average has stayed around $0.0010.
From country to country there is not too much variation in price. The most expensive countries once again are Nigeria and Guinea-Bissau.
As for inflation, every country has a roughly normal distribution, but Chad has a much wider one. Chad is ranked as one of the poorest nations in the world, with 55 percent of its 11.2 million citizens living below the poverty line and 36 percent living in extreme poverty (World Bank). Additionally, in the 2010 UNDP Human Development Index, which measures a country's standard of living, Chad ranks 163rd out of 169. These factors would explain food insecurity problems such as famine and inflation.
Inflation Country Box